Systems and Method for Analyzing and Validating Invoices

ABSTRACT

A system and method for managing and processing a plurality of types of invoices at a user's site, involving importing the plurality of types of invoices to provide comparable invoices and auditing the comparable invoices by performing an automated reasonability test on them. The system and method also provide a means for approving, processing and reporting on the comparable invoices.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a system and method for electronically processing and validating a plurality of types of invoices.

2. Background Art

The traditional methods of collecting, reviewing and validating vendors' invoices, especially periodic invoices such as telecommunications and utility bills, are manual processes. These methods impose substantial difficulties on users who receive large volumes of such invoices, especially when the invoices come from multiple vendors.

Although telecom invoices, for example, are often received via Electronic Data Interchange (EDI), many vendors still provide only paper invoices. While paper invoices enable a vendor to provide billing information to any customer regardless of the customer's technology infrastructure, this flexibility impedes customers from analyzing and auditing the billing information. Paper invoices may be scanned and converted into machine encoded text via optical character recognition, but the billing components in the machine encoded text, and the relationships between them, are not in a form that can be analyzed.

Identification of the billing components is particularly difficult because invoices differ from vendor to vendor, and from billing platform to billing platform. Vendors may use different terminology to denote the same billing components. Moreover, billing components may be arranged in different locations from invoice to invoice. Finally, even if the billing components are in the same locations and referenced using the same terminology, the relationships between the billing components may be defined differently from invoice to invoice. For example, one invoice may include certain taxes as part of the total line charges, but another invoice may not include the taxes.

As a result of the structural differences between various invoices, users are typically forced to manually enter and audit the billing information for each invoice. Because of the large amount of billing information contained in an invoice, and the complicated billing component relationships, users spend a substantial amount of time entering and auditing invoices.

The problem is exacerbated when there are multiple invoices representing multiple vendors and multiple billing platforms. For example, a customer may receive an invoice from Verizon, an invoice from Sprint, and wireless and MPLS invoices from AT&T. Each invoice may have different billing components, and the billing components may be arranged in different locations. Because the invoices are structured differently, users would have to spend significant time entering and auditing the invoices. In addition to being cumbersome, the process would be highly error prone.

What is therefore needed is a system for automatically capturing and auditing billing information from invoices.

SUMMARY OF THE INVENTION

The current invention provides a system and a method that permit a user to electronically process and validate a plurality of types of invoices, particularly telecommunication and utility invoices. A type of invoice includes, but is not limited to, paper-based invoices from a plurality of vendors and billing platforms. A plurality means that at least two different types of invoices can be received. The system includes a means for processing a plurality of types of invoices and a means for performing a validation test on the invoices at the user site. More specifically, this invention provides a system for processing a plurality of types of invoices received by a user from a plurality of vendors.

Using the present invention, a user can (1) receive invoice information contained in a paper invoice from a vendor; and (2) automatically process the invoice information, resulting in either approval of the invoice information or identification of billing exceptions. The advantages of the present invention over conventional systems and techniques are numerous and include the following: (1) automatic paper invoice processing, thus increasing efficiency; (2) a drastic reduction in the administrative costs and human resources needed for processing invoices; (3) real-time updating of vendor-specific invoice rules and templates, so that the user never works with out-of-date rules or templates; (4) electronic data input to accounting systems, reducing invoice inaccuracies; (5) facilitating the generation of a large number of specialized reports, including audit, summary and customizable (custom) reports, that provide the user with valuable feedback on the transactions processed through the system; and (6) an improved way to communicate and provide feedback to the user regarding the invoices received from the vendors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a data flow diagram which depicts the flow of data between major processes in the present system.

FIG. 2 illustrates a block diagram of the Optical Invoice Recognizer (OIR) engine.

FIG. 3 illustrates an example paper invoice.

FIG. 4 illustrates the second page of the paper invoice in FIG. 3.

FIG. 5 illustrates an XML file generated by the Optical Invoice Recognizer (OIR) engine based on the example paper invoice of FIG. 3.

FIG. 6 is a flowchart of an illustrative method for verifying an invoice for completeness and accuracy according to an embodiment of the present invention.

FIG. 7 illustrates a block diagram of an exemplary computer system on which the embodiments can be implemented.

DESCRIPTION OF THE INVENTION

An embodiment of the present invention provides an Optical Character Recognition (OCR) engine, an Optical Invoice Recognition (OIR) engine that includes a preprocessor and an analysis engine, and software thereof. In the detailed description that follows, references to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

The term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation. Alternate embodiments may be devised without departing from the scope of the invention, and well-known elements of the invention may not be described in detail or may be omitted so as not to obscure the relevant details of the invention. In addition, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. For example, as used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

FIG. 1 is a data flow diagram which depicts the flow of data between major processes in the present system. The system is made up of various modules that can receive inputs of vendor invoices and provide output to a user, a user database, a user human resource system, and a user accounting system. A module is a component of the system that has a predefined set of inputs and outputs. These inputs and outputs can be from or to the system or user.

The system includes means for importing various types of paper invoices to an Optical Character Recognition (OCR) engine 108 to provide equivalent machine encoded text versions of the invoices. The system also includes means for importing machine encoded text representing an invoice to an Optical Invoice Recognition (OIR) engine 112 to validate the billing information contained in the invoice. OIR engine 112 includes means for locating and capturing billing components contained in the invoice, including, but not limited to, billing identifiers such as phone numbers, circuit IDs, and meter IDs; charges such as service charges, usage charges, usage amounts, taxes, and surcharges; and amounts such as quantities, minutes, messages, and kW. OIR engine 112 also includes means for validating, approving and processing the invoice information. The following sections describe the various means to accomplish these functions.

Diagram 100 includes invoices 102, image scanner 104, scanned image file(s) 106, OCR engine 108, machine encoded text 110, OIR engine 112, and validation result 114.

Invoices 102 include one or more paper invoices from one or more vendors. The invoices each include one or more billing components. In the case of telecom invoices, the billing components may represent phone numbers, circuit IDs, service charges, usage charges, usage amounts, taxes, and surcharges associated with a client's services.

Billing components are associated with other billing components. Typically, billing components are arranged hierarchically with respect to other billing components. For example, most telecommunication invoices have a summary level of charges that includes billing components like the previous month's billing, the amount paid, late charges, and the current month's charges. The next level of detail under the current month's charges may include a summary of the charges by each billing identifier. For example, there may be a summary of charges for each phone number, circuit ID, device ID, or location ID. Below the summary charges for each billing identifier is typically another level of detail. For example, in the case of a phone number there may be the total service charges, the total usage charges, and the total taxes. Finally, below each of these charges is typically another level of detail. For example, in the case of total taxes, there are federal, state, and county taxes. At the most granular level of detail there will be usage details such as the actual call itself, including such details as the time of day, duration, called number, cost, etc.
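The hierarchy described above maps naturally onto a tree. The following is a minimal Python sketch under stated assumptions: the `BillingComponent` class, its `walk` method, the phone number label and all dollar amounts are hypothetical, chosen only to show how a per-phone-number summary nests service, usage and tax charges, with taxes broken down by jurisdiction.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Iterator


@dataclass
class BillingComponent:
    """One node in the invoice hierarchy (hypothetical model for illustration)."""
    name: str
    amount: Decimal
    children: list["BillingComponent"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[tuple[int, "BillingComponent"]]:
        """Yield (depth, component) pairs, visiting each parent before its children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Hypothetical amounts for a single phone number.
taxes = BillingComponent("Total taxes", Decimal("4.50"), [
    BillingComponent("Federal tax", Decimal("2.00")),
    BillingComponent("State tax", Decimal("1.75")),
    BillingComponent("County tax", Decimal("0.75")),
])
phone_line = BillingComponent("555-0100 summary", Decimal("54.50"), [
    BillingComponent("Total service charges", Decimal("40.00")),
    BillingComponent("Total usage charges", Decimal("10.00")),
    taxes,
])
current_charges = BillingComponent("Current month's charges", Decimal("54.50"), [phone_line])

# Print the hierarchy, indenting each level of detail.
for depth, node in current_charges.walk():
    print("  " * depth + f"{node.name}: {node.amount}")
```

Decimal is used rather than float so that currency sums compare exactly; the sketch requires Python 3.9 or later for the built-in generic annotations.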

As would be appreciated by a person of ordinary skill in the art, invoices often differ structurally from vendor to vendor and from billing platform to billing platform. Specifically, invoices may vary in the number of billing components, the type of billing components, and the relationships between billing components. In the case of vendors, invoices from AT&T may have a different number of billing components than invoices from Verizon. In the case of billing platforms, billing components in wireless invoices from AT&T may be located in different positions than billing components in MPLS invoices from AT&T.

Invoices 102 are processed by image scanner 104 to produce scanned image files 106. Image scanner 104 is a device that optically scans images, printed text, handwriting, or an object, and converts it to a digital image. Scanned image files 106 are digital image representations of invoices 102. In an exemplary embodiment, scanned image files 106 are Tagged Image File Format (TIFF) files. The Tagged Image File Format is a file format for storing images that is popular among graphic artists and the publishing industry. However, as would be appreciated by a person of ordinary skill in the art, various other types of image file formats, such as the Joint Photographic Experts Group (JPEG) file format and the Portable Network Graphics (PNG) file format, may be used to represent scanned image files 106.

Scanned image files 106 are processed by Optical Character Recognition (OCR) engine 108. OCR engine 108 receives scanned image files 106 and produces machine encoded text 110. As would be appreciated by a person of skill in the art, OCR is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine encoded text. It is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records.

In an exemplary embodiment, OCR engine 108 produces one or more PDF files of the invoices. The PDF files contain the machine encoded text 110 generated by OCR engine 108. While the PDF file format may be used to represent machine encoded text 110, as would be appreciated by a person of skill in the art, various other file formats may be used to represent machine encoded text 110. For example, plain text files, rich text files, etc. may be used to represent machine encoded text 110.

Machine encoded text 110 is processed by Optical Invoice Recognition (OIR) engine 112. Alternatively, machine encoded text 110 that does not come from the scanning and OCR process may be input to and processed by OIR engine 112. OIR engine 112 interprets the machine encoded text to create a hierarchy of billing information that is analyzed and validated to produce a hierarchical validated invoice 114. Hierarchical validated invoice 114 indicates that the provided invoice is complete and accurate. OIR engine 112 is described in further detail with reference to FIG. 2 below.

FIG. 2 illustrates a block diagram of the Optical Invoice Recognizer (OIR) engine 112. OIR engine 112 is used to analyze and validate the machine encoded text of the paper invoices. In particular, OIR engine 112 ensures that an invoice contains complete and accurate billing components. OIR engine 112 receives machine encoded text and outputs a hierarchical validated invoice.

OIR engine 112 is made up of various modules; it receives machine encoded text as input and outputs a hierarchical validated invoice to a user or system. A module is a component of the system that has a predefined set of inputs and outputs. These inputs and outputs can be from or to the system or user. The system includes means for importing the invoice information produced by OCR engine 108, organizing the information into a hierarchy, and validating it.

OIR engine 112 includes a preprocessor 210 and an analysis engine 220. In addition, OIR engine 112 utilizes knowledge base 230. Knowledge base 230 includes information associated with a plurality of vendors and billing platforms. More specifically, each billing platform includes templates 240 and rules 250, wherein each billing platform is associated with one of the plurality of vendors.

In an exemplary embodiment, preprocessor 210 receives machine encoded text 110 generated by OCR engine 108. Preprocessor 210 identifies the vendor and billing platform associated with the machine encoded text of the invoice. In addition, preprocessor 210 locates and captures all of the billing components specific to that vendor and billing platform in the machine encoded text. Preprocessor 210 not only captures all of the billing components but also retains the associations between the billing components.

In order to identify the billing components and the associations between them, preprocessor 210 must first identify the vendor and billing platform associated with the invoice. In particular, the machine encoded text of the invoice is compared with a general knowledge base that looks for any number or combination of words and phrases, the spatial relationships of these words, and images. As would be appreciated by a person of ordinary skill in the art, various pattern matching methods may be applied to the machine encoded text in order to determine the vendor and billing platform associated with the invoice.
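One simple pattern matching method is to score each known (vendor, billing platform) pair by how many of its signature patterns occur in the text and to keep the best-scoring pair. The sketch below is illustrative only: the `SIGNATURES` table and `identify` function are hypothetical, and the sketch uses words and phrases only, ignoring the spatial relationships and images that the general knowledge base may also consider.

```python
import re
from typing import Optional

# Hypothetical general knowledge base: signature patterns for each
# (vendor, billing platform) pair. Text-only; no layout or image cues.
SIGNATURES = {
    ("PSEG", "gas and electric"): [r"\bPSEG\b", r"\bgas\b", r"\belectric\b"],
    ("AT&T", "wireless"): [r"\bAT&T\b", r"\bwireless\b", r"\bdata usage\b"],
    ("AT&T", "MPLS"): [r"\bAT&T\b", r"\bMPLS\b", r"\bport\b"],
}


def identify(text: str) -> Optional[tuple[str, str]]:
    """Return the (vendor, platform) whose signature patterns match most often, or None."""
    best, best_score = None, 0
    for key, patterns in SIGNATURES.items():
        # Count how many of this candidate's patterns appear anywhere in the text.
        score = sum(1 for p in patterns if re.search(p, text, re.IGNORECASE))
        if score > best_score:  # ties keep the earlier entry
            best, best_score = key, score
    return best


print(identify("PSEG  Your monthly Gas and Electric bill"))
```

Counting matched patterns lets a shared word such as "AT&T" contribute to several candidates while the platform-specific terms decide between them.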

Once the vendor and billing platform have been identified, preprocessor 210 identifies and locates the billing components contained in the invoice. Preprocessor 210 uses the identified vendor and billing platform information to locate a vendor and billing platform specific template 240 in knowledge base 230. Template 240 is a generalized representation of an invoice that is specific to the identified vendor and billing platform. As would be appreciated by a person of ordinary skill in the art, various structures and formats may be used to model template 240. For example, structured document formats such as XML may be used to model such templates.

Preprocessor 210 applies template 240 to the machine encoded text in order to identify the billing components. As would be appreciated by a person of ordinary skill in the art, various methods and techniques may be used to apply the template to the machine encoded text in order to identify the billing components and the relationships between said billing components. For example, various pattern matching rules contained in the template may be used to determine which template elements correspond to which billing components in the machine encoded text representing the invoice. The patterns may range from tag names to very complicated patterns that match very specific billing components of the machine encoded text representing the invoice.
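A template of pattern matching rules can be as simple as one regular expression per template element, together with the element's parent. The following sketch assumes that representation; the `TEMPLATE` dictionary, the invoice labels it matches, and `apply_template` are hypothetical and do not describe any particular vendor's layout.

```python
import re
from decimal import Decimal

# Hypothetical template for one vendor and billing platform. Each element maps
# to (parent element, pattern); the single group captures the printed amount.
# Parents are listed before children so the order can drive tree building.
TEMPLATE = {
    "TotalCurrentCharges": (None, r"Total Current Charges\s+\$?([\d,]+\.\d{2})"),
    "ServiceCharges": ("TotalCurrentCharges", r"Service Charges\s+\$?([\d,]+\.\d{2})"),
    "UsageCharges": ("TotalCurrentCharges", r"Usage Charges\s+\$?([\d,]+\.\d{2})"),
    "Taxes": ("TotalCurrentCharges", r"Taxes\s+\$?([\d,]+\.\d{2})"),
}


def apply_template(text, template):
    """Return (element, parent, amount) for every template element found in the text."""
    captured = []
    for element, (parent, pattern) in template.items():
        match = re.search(pattern, text)
        if match:  # elements the text lacks are simply not captured
            amount = Decimal(match.group(1).replace(",", ""))
            captured.append((element, parent, amount))
    return captured


sample = """Total Current Charges $1,054.50
Service Charges $1,000.00
Usage Charges $40.00
Taxes $14.50"""
for element, parent, amount in apply_template(sample, TEMPLATE):
    print(element, parent, amount)
```

Missing elements are left out rather than reported here; detecting them is a job for the analysis step.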

Once template 240 has been applied to the machine encoded text, preprocessor 210 outputs a hierarchical data structure that contains all the billing components and a unique tag number for each billing component. Because the billing components are arranged in a hierarchical data structure, the relationships between the billing components are captured implicitly in the hierarchical data structure. In an exemplary embodiment, the hierarchical data structure is represented as an XML (Extensible Markup Language) file. However, as would be appreciated by a person of ordinary skill in the art, various structures and file formats may be used for the hierarchical data structure.
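As a minimal sketch of this output step, captured components can be nested into an XML tree using the standard library's `xml.etree.ElementTree`, with each element receiving a sequential unique tag number. The `to_xml` function, the `Invoice` root name and the `tagNo` and `amount` attribute names are assumptions for illustration, not the format of FIG. 5.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal
from itertools import count


def to_xml(components):
    """Build an XML tree from (element, parent, amount) triples.

    Parents must appear before their children. Each element receives a
    sequential, unique tag number in a hypothetical ``tagNo`` attribute.
    """
    root = ET.Element("Invoice")
    nodes = {None: root}  # element name -> XML element; None is the root
    tag_numbers = count(1)
    for name, parent, amount in components:
        element = ET.SubElement(
            nodes[parent], name,
            {"tagNo": str(next(tag_numbers)), "amount": str(amount)},
        )
        nodes[name] = element
    return root


components = [
    ("TotalCurrentCharges", None, Decimal("54.50")),
    ("ServiceCharges", "TotalCurrentCharges", Decimal("40.00")),
    ("UsageCharges", "TotalCurrentCharges", Decimal("10.00")),
    ("Taxes", "TotalCurrentCharges", Decimal("4.50")),
]
print(ET.tostring(to_xml(components), encoding="unicode"))
```

Because each child is nested inside its parent, the parent/child relationships are carried by the tree itself, as the paragraph above describes.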

Analysis engine 220 receives the hierarchical data structure and outputs a hierarchical validated invoice. In other words, in the exemplary embodiment, analysis engine 220 analyzes the XML invoice and validates it by checking the included billing components for completeness and accuracy.

In order to check for completeness and accuracy, certain billing components should always be present for certain vendor invoices and for certain billing platforms. In particular, in the majority of telecommunication invoices the following components are captured at the highest level: invoice date, due date, account number, remittance information, total amount due, current monthly charges, etc. At the next level, there may be a check of whether the sum of the child billing components is equal to their parent billing component. Every branch of the billing components is validated to make sure that the calculation involving the child billing components equals the parent billing component.
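Both checks can be sketched over an XML tree like the one above. The sketch assumes amounts are stored in an `amount` attribute and tag numbers in a `tagNo` attribute; the element names in `REQUIRED_TOP_LEVEL` and the sample invoice values are hypothetical. The sample deliberately contains one misread value (taxes of 4.05 instead of 4.50) to show that a single bad amount flags both the branch it belongs to and its parent.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical names of components that must appear at the highest level.
REQUIRED_TOP_LEVEL = ["InvoiceDate", "DueDate", "AccountNumber", "TotalAmountDue"]


def missing_components(root, required=REQUIRED_TOP_LEVEL):
    """Return required top-level component names absent from the invoice."""
    return [name for name in required if root.find(name) is None]


def unbalanced_branches(element):
    """Return tag numbers of parents whose children's amounts do not sum to the parent's."""
    flagged = []
    children = [child for child in element if "amount" in child.attrib]
    if "amount" in element.attrib and children:
        total = sum(Decimal(child.get("amount")) for child in children)
        if total != Decimal(element.get("amount")):
            flagged.append(element.get("tagNo"))
    for child in element:  # validate every branch recursively
        flagged.extend(unbalanced_branches(child))
    return flagged


invoice = ET.fromstring("""<Invoice>
  <InvoiceDate>2013-05-01</InvoiceDate>
  <DueDate>2013-05-21</DueDate>
  <AccountNumber>123-456</AccountNumber>
  <TotalAmountDue tagNo="1" amount="54.50">
    <ServiceCharges tagNo="2" amount="40.00"/>
    <UsageCharges tagNo="3" amount="10.00"/>
    <Taxes tagNo="4" amount="4.05">
      <FederalTax tagNo="5" amount="2.00"/>
      <StateTax tagNo="6" amount="1.75"/>
      <CountyTax tagNo="7" amount="0.75"/>
    </Taxes>
  </TotalAmountDue>
</Invoice>""")

print(missing_components(invoice), unbalanced_branches(invoice))
```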

In order for analysis engine 220 to analyze and validate invoices, the analysis engine applies a set of rules 250 from the corresponding knowledge base 230. Rules 250 are a collection of vendor and billing platform specific rules. A rule consists of a pattern that describes how the rule can be applied to the hierarchical data structure and an action that describes what should be done when the rule is applied. Optionally, a rule can have further conditions that restrict the applicability of the rule. For example, the rule may only be applied if another rule has previously been applied. In an exemplary embodiment, rules 250 define an implicit strategy to exhaustively apply all the rules.
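The pattern/action/condition structure and the exhaustive application strategy can be sketched as a small fixed-point loop. Everything here is hypothetical: the `Rule` class, the per-component `requires` condition, and the two example rules ("parse-amount" and "flag-negative") are invented for illustration. The dependent rule is listed first on purpose, to show that repeated passes still apply it once its precondition has fired.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional


@dataclass
class Rule:
    """A hypothetical rule: a pattern, an action, and an optional precondition."""
    name: str
    pattern: Callable[[dict], bool]   # does the rule apply to this component?
    action: Callable[[dict], None]    # what to do when it applies
    requires: Optional[str] = None    # apply only after this rule fired on the same component


def apply_exhaustively(rules, components):
    """Apply every rule to every component until no rule can fire again."""
    fired = set()  # (rule name, component index) pairs already applied
    changed = True
    while changed:  # terminates: each pair can fire at most once
        changed = False
        for rule in rules:
            for i, component in enumerate(components):
                if (rule.name, i) in fired:
                    continue
                if rule.requires and (rule.requires, i) not in fired:
                    continue
                if rule.pattern(component):
                    rule.action(component)
                    fired.add((rule.name, i))
                    changed = True
    return fired


def parse_amount(component):
    component["amount"] = Decimal(component["amount"])


def flag(component):
    component["flagged"] = True


rules = [
    # Listed first on purpose: it cannot fire until parse-amount has run.
    Rule("flag-negative", lambda c: c["amount"] < 0, flag, requires="parse-amount"),
    Rule("parse-amount", lambda c: isinstance(c["amount"], str), parse_amount),
]
components = [{"name": "Taxes", "amount": "-1.00"}, {"name": "Usage", "amount": "12.00"}]
fired = apply_exhaustively(rules, components)
print(components)
```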

As would be appreciated by a person of ordinary skill in the art, various structures and formats may be used to represent rules 250. In addition, as would be appreciated by a person of ordinary skill in the art, various methods and techniques may be used to apply the rules to the machine encoded text in order to analyze and validate the billing components and the relationships between the billing components.

If any billing components are not calculated properly after rules 250 have been applied, then analysis engine 220 determines that either there is an issue with OCR engine 108 or the rules 250 in knowledge base 230 are incomplete. In the case of a problem with OCR engine 108, there is an OCR problem with either the parent billing component or one of its child components. In the case of an incomplete knowledge base 230, either one or more rules are missing or one or more rules have been incorrectly defined for the given vendor and billing platform. In either case, the unique tag numbers associated with the affected billing components in the hierarchical data structure are flagged as needing to be corrected. This makes it easy and quick for a person to correct an OCR problem or to further train knowledge base 230. Further training knowledge base 230 may include adding rules or correcting existing rules in rules 250 for the identified vendor and billing platform.

If the billing components are complete and accurate, then the invoice is likely correct. Analysis engine 220 generates a successful validation result and then sends the hierarchical validated invoice to be imported and analyzed by other modules.

An example paper invoice is illustrated in FIGS. 3 and 4. The paper invoice is a monthly gas and electric bill. FIG. 3 illustrates the first page of the invoice. FIG. 4 illustrates the second page of the invoice.

The invoice is composed of words, phrases and images. The number and combination of the words, phrases and images, as well as the spatial relationships between them, uniquely identify the vendor and billing platform associated with the invoice. In this case, the vendor is Public Service Enterprise Group (PSEG) and the billing platform is a monthly gas and electric bill.

In FIG. 3, the invoice is divided into two sections. The left column contains the vendor name (e.g., PSEG) and contact information. The right column contains the customer's account number, the invoice number and a series of summary billing components (e.g., billing components 310-350).

In FIG. 4, the left column contains usage information. The right column contains the billing components that comprise each summary billing component in FIG. 3. For example, billing components 445 and 475 comprise summary billing component 340.

As discussed above, in order to validate an invoice, preprocessor 210 first identifies the vendor and billing platform associated with the invoice. In the example invoice, the “PSEG” image and contact information in the left column, and the “PSEG” text in the right column, identify the vendor as “PSEG”. The presence of “Gas” and “Electric” in summary billing components 430 and 440, respectively, identifies the billing platform as a monthly gas and electric bill.

In order to ensure that the vendor and billing platform are accurately identified, preprocessor 210 may apply a threshold test to potential vendor and billing platform identifiers. In the example invoice, preprocessor 210 may require that 75% of the potential vendor identifiers match “PSEG” before the vendor is identified as “PSEG”.
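The threshold test amounts to a simple ratio. A minimal sketch, assuming the potential vendor identifiers have already been extracted as strings and that a case-insensitive exact comparison is acceptable; `vendor_passes_threshold` and the misread identifier "P5EG" are hypothetical.

```python
def vendor_passes_threshold(candidates, vendor, threshold=0.75):
    """Return True if at least `threshold` of the candidate identifiers match `vendor`."""
    if not candidates:
        return False
    matches = sum(1 for c in candidates if c.strip().upper() == vendor.upper())
    return matches / len(candidates) >= threshold


# Four vendor identifiers read from the page; one was misread by OCR.
print(vendor_passes_threshold(["PSEG", "PSEG", "PSEG", "P5EG"], "PSEG"))  # True (3 of 4)
```

A stricter or looser threshold trades false identifications against invoices left unidentified for manual review.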

Once the vendor and billing platform are identified, preprocessor 210 uses a vendor and billing platform specific template to locate the billing components in the invoice. In FIG. 3, summary billing components 310-350 would be identified. In FIG. 4, the billing components that comprise each summary billing component would be identified, e.g., billing components 405, 410, 420-440 and 450-470.

Preprocessor 210 then outputs a hierarchical data structure that contains the identified billing components. The hierarchical data structure also stores the various relationships between the different billing components.

An example hierarchical data structure is illustrated in FIG. 5. FIG. 5 shows the identified billing components from FIGS. 3 and 4 stored in an XML file. In addition to storing the billing components, the XML file captures the relationships between the various billing components. For example, billing component 340 is represented as XML element 510. Similarly, billing components 420, 425, 430, 435, 440, 450, 455, 460, and 465 are represented as XML elements 515, 520, 525, 530, 540, 545, 550, 555 and 560, respectively.

As discussed above, analysis engine 220 analyzes the hierarchical data structure in order to validate the invoice for completeness and accuracy. In the case of FIG. 5, analysis engine 220 would confirm that billing components 310-350 are present in the XML file. Billing components 310-350 represent summary data such as the current gas amount (e.g., 330), the current electric amount (e.g., 340) and the total amount due (e.g., 350). Because the current gas amount and the current electric amount are necessary to compute the total amount due, both must be present in the XML file. Similarly, because the total amount due is necessary for payment of the invoice, it must be present in the XML file.

In addition, analysis engine 220 validates the accuracy of the billing components by applying vendor and billing platform specific rules. For example, billing component 350 (e.g., total amount due) must be equal to the sum of billing components 310 (e.g., previous balance), 320 (e.g., previous payment), 330 (e.g., current gas amount) and 340 (e.g., current electric amount). Similarly, billing component 475 (or 340) must be equal to the sum of billing components 445 (e.g., delivery subtotal) and 470 (e.g., supply subtotal).
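The two summation rules can be expressed as data keyed by the reference numerals above. In this sketch every amount is invented, and the previous payment (320) is assumed to be captured as a negative credit so that a plain sum reproduces the total amount due; numeral 340 stands in for 475, since the two carry the same value. `captured`, `SUM_RULES` and `failed_sum_rules` are hypothetical names.

```python
from decimal import Decimal

# Invented amounts keyed by the reference numerals of FIGS. 3 and 4.
captured = {
    310: Decimal("120.00"),   # previous balance
    320: Decimal("-120.00"),  # previous payment, assumed captured as a credit
    330: Decimal("45.10"),    # current gas amount
    340: Decimal("88.25"),    # current electric amount (same value as 475)
    350: Decimal("133.35"),   # total amount due
    445: Decimal("30.00"),    # delivery subtotal
    470: Decimal("58.25"),    # supply subtotal
}

# Each rule: (parent numeral, child numerals whose amounts must sum to the parent).
SUM_RULES = [
    (350, [310, 320, 330, 340]),
    (340, [445, 470]),
]


def failed_sum_rules(values, rules=SUM_RULES):
    """Return the parent numerals whose children do not add up."""
    return [parent for parent, children in rules
            if sum(values[c] for c in children) != values[parent]]


print(failed_sum_rules(captured))  # []
```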

Analysis engine 220 may also apply other rules to validate the accuracy of billing components. For example, billing component 450 (e.g., BGS Capacity Generation) is equal to billing component 480 (e.g., generation kW) multiplied by the rate per kW (e.g., $5.41822297).
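Because the rate is quoted to eight decimal places while charges are printed in cents, a rate rule has to round the product before comparing it. The sketch below uses the rate from the example invoice; the 12.5 kW figure is invented, and half-up rounding to the cent is an assumption, since a vendor's actual rounding convention may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_KW = Decimal("5.41822297")  # rate per kW quoted in the example invoice


def expected_generation_charge(kw, rate=RATE_PER_KW):
    """Multiply generation kW by the rate and round to cents (half-up assumed)."""
    return (kw * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def generation_charge_ok(charge, kw):
    """Check a captured BGS Capacity Generation charge against kW x rate."""
    return charge == expected_generation_charge(kw)


# Invented reading: 12.5 kW x 5.41822297 = 67.727787125, which rounds to 67.73.
print(expected_generation_charge(Decimal("12.5")))  # 67.73
```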

FIG. 6 is a flowchart of an exemplary method 600 for verifying an invoice for completeness and accuracy according to embodiments of the present invention. Other structural embodiments will be apparent to persons skilled in the relevant art(s) based on the following discussion. The operations shown in FIG. 6 need not occur in the order shown, nor does method 600 require that all of the operations shown in FIG. 6 be performed. The operations of FIG. 6 are described in detail below.

In step 610, machine encoded text 110 generated by OCR engine 108, or machine encoded text input manually, is analyzed to determine the vendor and billing platform associated with the invoice. In particular, the machine encoded text of the invoice is compared with a general knowledge base that looks for any number or combination of words and phrases, the spatial relationships of these words, and images. As would be appreciated by a person of ordinary skill in the art, various pattern matching methods may be used to determine the vendor and billing platform associated with the invoice.

In step 612, once the vendor and billing platform are identified, the invoice is analyzed to capture billing components specific to that vendor and billing platform. In particular, OIR engine 112 looks up the vendor and billing platform specific template 240 and rules 250 in knowledge base 230 that are associated with the identified vendor and billing platform. OIR engine 112 then applies template 240 in order to locate and capture all the billing components. Each billing component is assigned a unique tag number. OIR engine 112 then stores the captured billing components in a hierarchical data structure such as an XML file.

In step 614, OIR engine 112 analyzes the hierarchical data structure representing the invoice for completeness and accuracy. In particular, OIR engine 112 applies a collection of rules 250 for a specific vendor and billing platform stored in knowledge base 230 to the identified billing components. The rules 250 define what billing components are required in the invoice and the relationships between the billing components. For example, a rule might specify that the sum of the Federal, State, and local taxes billing components should equal the Total taxes billing component. In another example, a rule might specify that there must always be a Total Charges billing component present in the invoice.
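Since rules 250 may use various formats, one possibility is to store them declaratively so that a knowledge base can be retrained by editing data rather than code. The sketch below encodes the two example rules from step 614 as JSON; the rule schema (`"type"`, `"component"`, `"parent"`, `"children"`), the component names, and the `evaluate` function are all hypothetical.

```python
import json
from decimal import Decimal

# Hypothetical declarative encoding of the two example rules from step 614.
RULES_JSON = """
[
  {"type": "required", "component": "TotalCharges"},
  {"type": "sum", "parent": "TotalTaxes", "children": ["FederalTax", "StateTax", "LocalTax"]}
]
"""


def evaluate(rules, components):
    """Return human-readable problems found when applying the rules to captured amounts."""
    problems = []
    for rule in rules:
        if rule["type"] == "required":
            if rule["component"] not in components:
                problems.append(f"missing {rule['component']}")
        elif rule["type"] == "sum":
            names = [rule["parent"], *rule["children"]]
            missing = [n for n in names if n not in components]
            if missing:
                problems.extend(f"missing {n}" for n in missing)
            elif sum(components[c] for c in rule["children"]) != components[rule["parent"]]:
                problems.append(f"{rule['parent']} != sum of {', '.join(rule['children'])}")
        else:
            raise ValueError(f"unknown rule type {rule['type']!r}")
    return problems


rules = json.loads(RULES_JSON)
components = {
    "TotalCharges": Decimal("60.00"), "TotalTaxes": Decimal("4.50"),
    "FederalTax": Decimal("2.00"), "StateTax": Decimal("1.75"), "LocalTax": Decimal("0.75"),
}
print(evaluate(rules, components))  # []
```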

In step 616, if there are no inaccurate or missing billing components, then operation continues to step 624. Otherwise, the inaccurate or missing billing components are flagged based on each billing component's unique tag number and operation continues at step 618.

In step 618, the user is presented with the flagged billing components. The billing components were flagged either because the vendor and billing platform specific template and rules in knowledge base 230 need to be retrained or because of an OCR problem. If the vendor and billing platform template and rules need to be retrained, then operation continues to step 620. Otherwise, if the OCR recognition process was problematic, then operation continues to step 622.

In step 620, knowledge base 230 has incomplete or inaccurate templates or rules. The user, therefore, adds new or corrected information to the knowledge base. For example, new or corrected rules and templates may be added to template 240 and rules 250 for the corresponding vendor and billing platform in knowledge base 230. Operation then continues to step 612, where the new or corrected information is applied to the invoice in order to correctly identify and analyze the billing components.

In step 622, OCR engine 108 has produced an incorrect translation of the invoice into machine encoded text. Therefore, the user either corrects the machine encoded text directly or rescans and re-OCRs the invoice. Because the billing components are flagged, a user can often simply enter the corrected invoice information directly. Operation then continues to step 610, where the corrected machine encoded text is rerun through method 600.

In step 624, OIR engine 112 produces a validation result of success and presents the validated invoice to the user or other modules for further processing.
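The control flow of steps 610 through 624 can be summarized as a loop with two re-entry points. This is a sketch under stated assumptions: `verify_invoice`, its callback parameters, the `"ocr"` cause string and the `max_rounds` safeguard are all hypothetical. The callbacks stand in for the identification, capture and analysis stages and for the user's review in step 618, and knowledge base retraining is assumed to take effect through the `capture` and `analyze` callbacks.

```python
def verify_invoice(text, identify, capture, analyze, resolve, max_rounds=10):
    """Drive the steps of method 600 using caller-supplied callbacks (all hypothetical)."""
    platform = identify(text)                      # step 610: vendor and billing platform
    for _ in range(max_rounds):
        components = capture(text, platform)       # step 612: apply template, tag components
        flagged = analyze(components, platform)    # step 614: apply rules
        if not flagged:                            # step 616: nothing inaccurate or missing
            return components                      # step 624: validation succeeded
        cause, text = resolve(flagged, text)       # step 618: user reviews flagged tags
        if cause == "ocr":                         # step 622: text corrected or re-OCR'd
            platform = identify(text)              # ... rerun from step 610
        # Otherwise step 620: the knowledge base was retrained; loop back to step 612.
    raise RuntimeError("invoice still has flagged billing components")


# Example: an OCR misread ("1O.00") is flagged, corrected by the user, then validated.
result = verify_invoice(
    "Usage 1O.00",
    identify=lambda t: ("Vendor", "Platform"),
    capture=lambda t, p: {"text": t},
    analyze=lambda c, p: ["tag-3"] if "1O.00" in c["text"] else [],
    resolve=lambda flagged, t: ("ocr", t.replace("1O.00", "10.00")),
)
print(result)
```

The `max_rounds` bound is an added safeguard so the sketch cannot loop forever if flags are never resolved.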

Example General Purpose Computer System

Embodiments presented herein, or portions thereof, can be implemented in hardware, firmware, software, and/or combinations thereof.

The embodiments presented herein apply to any communication system between two or more devices or within subcomponents of one device. The representative functions described herein can be implemented in hardware, software, or some combination thereof. For instance, the representative functions can be implemented using computer processors, computer logic, application-specific integrated circuits (ASICs), digital signal processors, etc., as will be understood by those skilled in the art based on the discussion given herein. Accordingly, any processor that performs the functions described herein is within the scope and spirit of the embodiments presented herein.

The following describes a general purpose computer system that can be used to implement embodiments of the disclosure presented herein. The present disclosure can be implemented in hardware, or as a combination of software and hardware. Consequently, the disclosure may be implemented in the environment of a computer system or other processing system. An example of such a computer system 700 is shown in FIG. 7. The computer system 700 includes one or more processors, such as processor 704. Processor 704 can be a special purpose or a general purpose digital signal processor. The processor 704 is connected to a communication infrastructure 702 (for example, a bus or network). Various software implementations are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the disclosure using other computer systems and/or computer architectures.

Computer system 700 also includes a main memory 706, preferably random access memory (RAM), and may also include a secondary memory 708. Secondary memory 708 may include, for example, a hard disk drive 710 and/or a removable storage drive 712, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, or the like. Removable storage drive 712 reads from and/or writes to a removable storage unit 716 in a well-known manner. Removable storage unit 716 represents a floppy disk, magnetic tape, optical disk, or the like, which is read by and written to by removable storage drive 712. As will be appreciated by persons skilled in the relevant art(s), removable storage unit 716 includes a computer usable storage medium having stored therein computer software and/or data.

In alternative implementations, secondary memory 708 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 700. Such means may include, for example, a removable storage unit 718 and an interface 714. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a thumb drive and USB port, and other removable storage units 718 and interfaces 714 which allow software and data to be transferred from removable storage unit 718 to computer system 700.

Computer system 700 may also include a communications interface 720. Communications interface 720 allows software and data to be transferred between computer system 700 and external devices. Examples of communications interface 720 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 720 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 720. These signals are provided to communications interface 720 via a communications path 722. Communications path 722 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and other communications channels.

As used herein, the terms “computer program medium” and “computer readable medium” are used to generally refer to tangible storage media such as removable storage units 716 and 718 or a hard disk installed in hard disk drive 710. These computer program products are means for providing software to computer system 700.

Computer programs (also called computer control logic) are stored in main memory 706 and/or secondary memory 708. Computer programs may also be received via communications interface 720. Such computer programs, when executed, enable the computer system 700 to implement the present disclosure as discussed herein. In particular, the computer programs, when executed, enable processor 704 to implement the processes of the present disclosure, such as any of the methods described herein. Accordingly, such computer programs represent controllers of the computer system 700. Where the disclosure is implemented using software, the software may be stored in a computer program product and loaded into computer system 700 using removable storage drive 712, interface 714, or communications interface 720.

In another embodiment, features of the disclosure are implemented primarily in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) and gate arrays. Implementation of a hardware state machine so as to perform the functions described herein will also be apparent to persons skilled in the relevant art(s).

CONCLUSION

While various embodiments have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the embodiments presented herein.

The embodiments presented herein have been described above with the aid of functional building blocks and method steps illustrating the performance of specified functions and relationships thereof. The boundaries of these functional building blocks and method steps have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Any such alternate boundaries are thus within the scope and spirit of the claimed embodiments. One skilled in the art will recognize that these functional building blocks can be implemented by discrete components, application-specific integrated circuits, processors executing appropriate software, and the like, or any combination thereof. Thus, the breadth and scope of the present embodiments should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
1. A computer-readable storage device having computer-executable instructions stored thereon, execution of which, by a computing device, causes the computing device to perform operations comprising: identifying a vendor associated with a machine-encoded text; identifying a knowledge base associated with the vendor; extracting one or more billing components from the machine-encoded text according to the knowledge base; arranging the billing components in a hierarchical data structure; and validating the billing components arranged in the hierarchical data structure according to the knowledge base.
2. The computer-readable storage device of claim 1 further comprising receiving a scanned image of an invoice and converting the scanned image into the machine-encoded text.
3. The computer-readable storage device of claim 1 wherein the knowledge base comprises a template and a plurality of rules.
4. The computer-readable storage device of claim 3 wherein the template is applied to the machine-encoded text in order to locate the billing components.
5. The computer-readable storage device of claim 3 wherein the plurality of rules is applied to the hierarchical data structure in order to validate the billing components.
6. The computer-readable storage device of claim 1 wherein the hierarchical data structure is an XML file.
7. The computer-readable storage device of claim 1 wherein each of the billing components is either a parent billing component or a child billing component.
8. The computer-readable storage device of claim 1 wherein each of the billing components is a charge, a usage, or a quantity.
9. A method of processing invoices comprising: identifying a vendor associated with a machine-encoded text; identifying a knowledge base associated with the vendor; extracting one or more billing components from the machine-encoded text according to the knowledge base; arranging the billing components in a hierarchical data structure; and validating the billing components arranged in the hierarchical data structure according to the knowledge base.
10. The method of claim 9 further comprising receiving a scanned image of an invoice and converting the scanned image into the machine-encoded text.
11. The method of claim 9 wherein the knowledge base comprises a template and a plurality of rules.
12. The method of claim 11 wherein the template is applied to the machine-encoded text in order to locate the billing components.
13. The method of claim 11 wherein the plurality of rules is applied to the hierarchical data structure in order to validate the billing components.
14. The method of claim 9 wherein the hierarchical data structure is an XML file.
15. The method of claim 9 wherein each of the billing components is either a parent billing component or a child billing component.
16. The method of claim 9 wherein each of the billing components is a charge, a usage, or a quantity.
17. An invoice management system comprising: an optical invoice recognition engine configured to: receive machine-encoded text; identify a vendor associated with the machine-encoded text; identify a knowledge base associated with the vendor; extract one or more billing components from the machine-encoded text according to the knowledge base; and arrange the billing components in a hierarchical data structure; and an analysis engine configured to: receive the hierarchical data structure; and validate the billing components arranged in the hierarchical data structure according to the knowledge base.
18. The invoice management system of claim 17 further comprising an optical character recognition engine configured to receive a scanned image of an invoice and convert the scanned image into the machine-encoded text.
19. The invoice management system of claim 17 wherein the knowledge base comprises a template and a plurality of rules.
20. The invoice management system of claim 19 wherein the template is applied to the machine-encoded text in order to locate the billing components.
21. The invoice management system of claim 17 wherein the hierarchical data structure is an XML file.
22. The invoice management system of claim 17 wherein each of the billing components is either a parent billing component or a child billing component.
23. The invoice management system of claim 19 wherein the plurality of rules is applied to the hierarchical data structure in order to validate the billing components.
24. The invoice management system of claim 17 wherein each of the billing components is a charge, a usage, or a quantity.
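Claims 1, 3, and 4 recite two operations: identifying a vendor and applying a knowledge-base template to machine-encoded text to locate billing components. These can be illustrated with a minimal Python sketch. It is not the disclosed implementation. Several things are assumptions made purely for illustration:

- Vendor identification is assumed to be a keyword match on the text.
- A template is assumed to be a set of regular expressions keyed by billing-component name.
- The vendor name, the `KNOWLEDGE_BASES` mapping, the helper functions, and the sample text are all hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical knowledge bases keyed by vendor. Each template maps a
# billing-component name to a regular expression that captures its amount.
KNOWLEDGE_BASES = {
    "ExampleTelco": {
        "identifier": re.compile(r"ExampleTelco", re.IGNORECASE),
        "template": {
            "monthly_service": re.compile(r"Monthly Service\s+\$?([\d,]+\.\d{2})"),
            "taxes": re.compile(r"Taxes and Surcharges\s+\$?([\d,]+\.\d{2})"),
            "total": re.compile(r"Total Current Charges\s+\$?([\d,]+\.\d{2})"),
        },
    },
}


def identify_vendor(text: str) -> Optional[str]:
    """Return the first vendor whose identifier pattern appears in the text."""
    for vendor, kb in KNOWLEDGE_BASES.items():
        if kb["identifier"].search(text):
            return vendor
    return None


def extract_components(text: str, vendor: str) -> dict:
    """Apply the vendor's template to locate billing components in the text."""
    components = {}
    for name, pattern in KNOWLEDGE_BASES[vendor]["template"].items():
        match = pattern.search(text)
        if match:
            # Strip thousands separators before converting to an exact decimal.
            components[name] = Decimal(match.group(1).replace(",", ""))
    return components


# Sample machine-encoded text, as an OCR engine might produce it.
ocr_text = """EXAMPLETELCO  Account 12345
Monthly Service        $40.00
Taxes and Surcharges   $5.50
Total Current Charges  $45.50
"""

vendor = identify_vendor(ocr_text)
if vendor is not None:
    print(vendor, extract_components(ocr_text, vendor))
```

Keying the templates by vendor reflects the point that terminology and layout differ from vendor to vendor. In practice, components the template fails to locate would simply be absent from the result and could be flagged for review.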